Skip to content

fix(tesseract): multi_stage reduce_by no-ops on type:time dimensions#10855

Open
tlangton3 wants to merge 1 commit into
cube-js:masterfrom
tlangton3:cube/fix-tesseract-multistage-reduce-by-time-dim
Open

fix(tesseract): multi_stage reduce_by no-ops on type:time dimensions#10855
tlangton3 wants to merge 1 commit into
cube-js:masterfrom
tlangton3:cube/fix-tesseract-multistage-reduce-by-time-dim

Conversation

@tlangton3

@tlangton3 tlangton3 commented May 11, 2026

Copy link
Copy Markdown
Contributor

Fixes #10854. A bare reduce_by: [<time_dimension>] (or group_by, or the grain: directive's exclude/keep_only) silently does nothing when the query projects that time dimension with a granularity.

Problem

Projected time dimensions are TimeDimensionSymbols whose full_name carries a granularity suffix (created_at_day, created_at_month, ...), and the suffix survives the whole reference chain. A bare grain entry compiles to a plain DimensionSymbol (orders.created_at), so the full_name-based matching in the planner never matches it — the entry is silently ignored, the time dimension stays in (or vanishes from) the partition, and window aggregates collapse to one-row partitions. Sums per day instead of running totals, ranks of 1 everywhere — no error, no warning.

This bites at two sites: the window path's PARTITION BY computation, and the JOIN-model path's leaf-grain shaping (partition_filter), which carried a FIXME warning the two copies would drift if only one was updated.

Fix

The granularity-aware matching now lives on MultiStageGrain itself (partition_dimensions()), and both sites delegate to it — removing the duplicated body and discharging that FIXME. The matching rule:

  • an entry with an explicit granularity (created_at.month) matches strictly on full_namegroup_by: [created_at.month] keeps only the month projection, not sibling granularities;
  • a bare entry matches the dimension at any granularity, by unwrapping both sides to their base symbol.

The asymmetry is deliberate: an earlier draft that unwrapped unconditionally broke queries projecting multiple granularities of the same time dimension, which the multi-granularity test below now pins.

Because the legacy reduce_by/group_by and the new grain: directive compile through the same grain construction, the fix covers both spellings.

Testing

Four integration tests (executed against postgres via testcontainers, result-level snapshots):

  • reduce_by_time_dim — bare reduce_by on the window path; fails without the fix (per-month splits instead of per-status totals)
  • reduce_by_time_dim_join_model — same bug via a type: max measure, which is not window-eligible and exercises the partition_filter site; fails without the fix
  • grain_exclude_time_dim_join_model — bare time-dim exclude spelled via the grain: directive with a non-empty include (forces the JOIN-model path); fails without the fix
  • group_by_time_dim_multiple_granularities — guard test pinning the strict-match behaviour for explicit granularities; passes with and without the fix by design

Two adjacent issues are deliberately out of scope and can be follow-ups: granularity-qualified entries in filter directives (filter: exclude: [created_at.month]) match nothing for date-range filters, and grain.include of a bare time dimension can add a duplicate grouping next to an already-projected granularity.

@github-actions github-actions Bot added rust Pull requests that update Rust code pr:community Contribution from Cube.js community members. labels May 11, 2026
@ovr ovr requested a review from waralexrom May 11, 2026 15:19
@tlangton3 tlangton3 changed the title fix(tesseract): multi_stage reduce_by/group_by no-ops on time dimensions fix(tesseract): multi_stage reduce_by no-ops on type:time dimensions May 11, 2026
@tlangton3 tlangton3 marked this pull request as ready for review May 11, 2026 15:51
A bare reduce_by/group_by (or grain exclude/keep_only) entry referencing
a time dimension never matched the projected TimeDimensionSymbol, whose
full_name carries a granularity suffix (_day, _month, ...). The entry
was silently ignored: the time dimension stayed in (or vanished from)
the window PARTITION BY / the JOIN-model leaf grain, collapsing window
aggregates to one-row partitions.

The granularity-aware matching lives on MultiStageGrain itself
(partition_dimensions): entries with an explicit granularity match
strictly on full_name; bare entries match across granularities by
unwrapping both sides to their base symbol. Both consumers — the window
path's member_partition_by_logical and the JOIN-model's partition_filter
— now delegate to it, removing the duplicated body and discharging the
FIXME about the two copies drifting.

Fixes cube-js#10854.
@tlangton3 tlangton3 force-pushed the cube/fix-tesseract-multistage-reduce-by-time-dim branch from fe7d04e to 640bc61 Compare June 10, 2026 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr:community Contribution from Cube.js community members. rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tesseract: multi-stage reduce_by silently no-ops against type:time dimensions

1 participant